updated: 2022-05-12_09:35:45-04:00
Algorithms
Thursdays one minute before midnight
25% off monday
Introduction
Three Goals:
- learn how to measure, test, and analyze the effectiveness of algorithm, program, or data structure
- present commonly used data structures
- reinforce the idea of tradeoffs
Efficiency
- an efficient solution is one that solves the problem within the stated resource constraints
- the cost of a solution is the amount of resources consumed (space and time)
Problem
- a problem is a task to be performed subject to resource constraints
- the problem definition should not contain any constraints on how the problem should be solved
- problems can be viewed as a specification of inputs and outputs like a math function
- a specific set of inputs is an instance of the problem
Algorithm
- an algorithm is a process followed to solve a problem
- Properties:
- Correctness: the algorithm implements the intended function
- It is composed of concrete steps. concrete: each step is well defined and doable
- The determination of the next step is unambiguous
- The number of steps is finite
- It must terminate
Program
- a program is an instantiation of an algorithm
Data Structure
- a data structure is a collection of data items with a particular organization and operations
- it is always possible to process data in structure
- using a suitable one can mean the difference between seconds and days in processing
- in order to choose one we need to analyze the problem to determine performance goals
- Analyze the problem to determine the operations that must be supported (add/delete/search)
- Quantify resource constraints for each operation
- Select the data structure that meets the constraints
- How often will the data be searched?
- Can data be deleted?
- Each structure has costs and benefits
Data Types
- a representation (set of values) together with operations
- Values + Operations
Abstract Data Type
- The implementation of a logical data type as a software component with a physical form
- Example: lists, stacks, queues
- ADT does not reveal how the data type is implemented
- data structure is used to implement the ADT
Putting the Science in Computer Science
- Science or Engineering?
- Hypothesis not just a prediction, must be testable, verifiable, and testable
Proofs
- A proof is an argument
- structure, completion
- state what you will prove, then state what you proved
Direct
- shows a conditional statement p -> q
- if p is true q is also tru
- combination p true and q false never happens
- Usually presented in paragraph form
- Outline:
- State assumption "we assume p is true"
- State what we are prove
- Present the steps of your argument
- State what you proved
Contradiction
Is we want to prove p is true, and we can find a contradiction, q, such that not p -> q is true
because q is false
- State the proposition
- Suppose $\neg$ p is true, and state what that means
- Set up the contradiction
- Show the contradiction
- State what was proven
Mathematical Induction
- Used to prove statements that assert P(n) is true for all positive integers n, where P(n) is a propositional function
- State what you are proving
- Basis: Verify P(1) is true
- Induction assumption (inductive hypothesis): assume P(k) is true for some arbitrary positive integer k (pick a k, not all k's)
- Inductive step: show that P(k) -> P(k+1) is true for all positive integers k
- State what you proved
(alternate)
- State what you will prove
- State what parameter you are doing the induction on
- Decide the base case for the parameter, show that the theorem is true for this base case
- State your induction assumption bounded from below by you base case\
- State what you are going to prove in inductive step
- Start with the new case and restate it in terms of your induction assumption from step 4
- State what you proved
(Example)
Prove: The number of empty subtrees in a non-empty binary tree is one more than the number of nodes in the tree
- State what you are proving: We must prove the theorem that the number of empty subtrees in a non-empty binary tree is one more than the number of nodes in the tree.
- State what parameter you are doing the induction on: Proof by induction on n, the number of nodes in the tree
- Decide on the base case for the parameter and show that the theorem is true for this base case: Base case n = 1. A binary tree with 1 node has 2 empty subtrees. The theorem is true for the base case
- State your induction assumption (hypothesis) bounded from below by your base case: Assume that the theorem is true for a tree T with k nodes, where k >= 1. That is, tree T has k nodes and k+1 empty subtrees
- State what you are going to prove: We must show that the theorem is true for a tree T' with k+1 nodes. That is T' has k+2 empty subtrees.
- Start with the new case and restate it in terms of your induction assumption from step 3: Consider T' with k+1 nodes. If we remove a node from T', we now have a tree T with k nodes. By our induction assumption, this tree has k+1 empty subtrees. Now, add a node onto tree T, thus restoring tree T'. This removes an empty subtree of T, but adds on 2 more empty subtrees to the restored node, so T' has a total of k+1-1+2 empty subtrees (k+2) ... and k+1 nodes.
- State what you proved: Thus, we have shown that a tree of n nodes has n + 1 empty subtrees. Therefore, the theorem is true by process of mathematical induction
Recurrence Relations
What is Recursion?
Induction vs. Recursion
T(n) = ...
- Induction works from the base case, recursion works to the base case
- A recurrence relation for the sequence {$a_n$} is an equation that expresses $a_n$ in terms of one or more of the previous terms of the sequence, namely a0, a1, ..., an-1 for all integers n with n>=n0 where n0 is a nonnegative integer
- A sequence is called a solution of a recurrence relation if its terms satisfy the recurrence relation
Counting Problems
- eg Time cost of an algorithm..
- Simple counting methods:
- Summations (sum for function)
- Permutations (ordered arrangement of a distinct set)
- Combinations
Modelling with Recurrence Relations
A pair of rabbits 1m1f placed on island
Rabbits don't breed until 2months
after, each produce another pair of rabbits each month
recurrence relation for number of rabbit pairs on island after n months, assuming none die$f_{n}$ = number of pairs after n months
$f_{1}$ = 1
$f_{2}$ = 1
$f_{n} = f_{n-1} + f_{n-2}$Fibonacci Sequence
Solving Recurrence Relations
Solving a recurrence relation is finding a closed form of a recurrence relation
- Closed form is an algebraic equation with the same value as the recurrence relation
Backwards Substitution/Iteration (only for simplest sequences with only one term)
- Replace occurrence of T on rhs with it's definition
- repeatedly substitute until you see a pattern
- write a formula for T(n) in terms of n and the number of substitutions, which we commonly call k
- Goal: detect a pattern that will let us rewrite the recurrence in terms of summation (expand recurrence as many steps as you need to see the pattern)
- Ideally, we end up with a summation we can give in a closed form
- We have not proven we have the correct closed form -- just that we guessed the pattern. We need to use an induction proof
Example 1:
T(n) = T(n-1) + 1 ; T(1) = 0a. expand
K = 1
T(n) = T(n-1) + 1
> T(n-1) = T(n-1-1) + 1 = T(n-2) + 1
T(n) = T(n-2) + 1 + 1
T(n) = T(n-2) + 2
K = 2
T(n) = T(n-2) + 2
> T(n-2) = T(n-2-1) + 1 = T(n-3) + 1
T(n) = T(n-3) + 2 + 1
K = 3
T(n) = T(n-3) + 3
> T(n-3) = T(n-3-1) + 1 = T(n-4) + 1b. generalize using n + k
T(n) = T(n-k) + kc. use base case to define k
T(1), k = ?
n-k = 1 (we are setting this based on what we know of base case)
T(n-(n-1)) = T(1)d. substitute new definition of k into general RR form
T(n) = T(n-(n-1)) + (n-1)
T(n) = T(1) + (n-1)
T(n) = n-1
Example 1 Proof of Correctness:
TBP:
The closed form for the recurrence relation
T(n) = T(n-1) + 1 ; T(1) = 0
is T(n) = n-1 for n $\geq$ 1Proof by Mathematical Induction on n
Basis: (use solution)
T(1)=1-1=0 $\checkmark$Inductive Hypothesis:
Suppose T(k)=k-1 for some k $\geq$ 1Inductive Step:
We must show that the statement is true for T(k+1).
That is..
T(k+1)=T(k+1-1)+1
T(k+1)=T(k)+1
now we want the rhs to reflect the original rr
By our inductive hypothesis T(k)=k-1
T(k+1)=(k-1)+1
T(k+1)=k
which is true since T(k)=k-1
$\therefore$ T(n)=n-1 has been shown true and the closed form T(n)=n-1 is correct for the RR T(n)=T(n-1)+1,T(1)=0 for n $\geq$ 1
Example 2:
T(n) = T(n-1) + n ; T(1) = 1a. expand
K = 1
T(n) = T(n-1) + n
> T(n-1) = T((n-1)-1)+(n-1) = T(n-2) + (n-1)
K = 2
T(n) = T(n-2) + (n-1) + n
> T(n-2) = T((n-2)-1) + (n-2) = T(n-3) + (n-2)
K = 3
T(n) = T(n-3) + (n-2) + (n-1) + nb. generalize
T(n) = 1 + ... + (n-2) + (n-1) + n
T(n) = 1 + 2 + 3 + ... + n$\sum_{i=1}^{n}i = \frac{n(n+1)}{2}$
Example 3:
T(n) = 2T(n-1) + 1; T(1) = 1a. expand
k=1
T(n)=2T(n-1)+1
> T(n-1)=2T(n-2)+1
T(n)=2(2T(n-2)+1)+1k=2
T(n)=2$^2$T(n-2)+2+1
> T(n-2)=2T(n-3)+1
T(n)=2$^2$(2T(n-3)+1)+2+1k=3
T(n)=2$^3$T(n-3)+4+2+1
T(n)=2$^3$T(n-3)+$2^2+2^1+2^0$b. generalize
T(n)= 2$^k$T(n-k)+$2^{k-1}+2^{k-2}+\dots+2^{2}+2^{1}+2^0$
let k=n-1
T(n)=2$^k$T(n-(n-1))+$2^{k-1}+2^{k-2}+\dots+2^{2}+2^{1}+2^0$
T(n)=2$^k$T(1)+$2^{k-1}+2^{k-2}+\dots+2^{2}+2^{1}+2^0$
T(n)=2$^k$+$2^{k-1}+2^{k-2}+\dots+2^{2}+2^{1}+2^0$c. put in terms of n
T(n)=2$^{n-1}$+$2^{n-2}+2^{n-3}+\dots+2^{2}+2^{1}+2^0$
T(n)=$\sum_{i=0}^{n-1}2^{i}$d. known closed form
T(n)=$\sum_{i=0}^{n-1}2^{i}=2^{(n-1)+1}-1=2^{n}-1$$\therefore$ T(n)=$2^{n}$-1
Example 3 Proof of Correctness:
TBP: The closed form of T(n) = 2T(n-1)+1; T(1) = 1
is T(n)=$2^{n}$-1Proof by Mathematical Induction on n:
Basis:
n = 1; T(1) = 2$^1$-1 = 1 $\checkmark$Inductive Hypothesis:
Suppose T(k)=2$^k$-1 is true for some k $\geq$ 1Inductive Step:
Need to show that T(k+1) $\stackrel{?}{=}$ 2T((k+1)-1)+1
T(k+1) $\stackrel{?}{=}$ 2T(k)+1
T(k+1) $\stackrel{?}{=}$ 2(2$^k$-1)+1
T(k+1) $\stackrel{?}{=}$ 2$^{k+1}$-2+1
T(k+1) $\stackrel{?}{=}$ 2$^{k+1}$-1$\therefore$ CF is equivalent
$\therefore$ True, T(n)=$2^n$-1 for n $\geq$ 1
Asymptotic Analysis
- measure efficiency of an algorithm as input size grows
- estimation technique
- Time & Space
Known Time Costs
T(n) = C
static int largest(int[] A){
int first = A[0];
return first;
}
T(n) = n
T(n) = n^2
int foo(int n){
int sum = 0;
for (int i = 1; i <= n; i++){ // n
for (int j = 1; j <= n; j++){ // n
sum++
}
}
}
Big O, Big Omega, Big Theta
$\Omega$ (f(n)) : lower bound
$\mathbb{O}$ (f(n)) : upper bound
$\Theta$ (f(n)) : when they meet with a constant factor
cases deal with particular size of input
Lower bound is maximum lower bound
Space Requirements
- If an array of n integer requires c bytes per n, $\Theta$(n)
- Overhead: need to store extra information about the data structure itself
- Lookup table: using space to reduce computations
Polynomial Time
-
we can get an exact solution to these problems within some 'reasonable' amount of time
-
O(n^k)
-
considered tractable
- a solution may exist, but it would take a prohibitively long time to compute
-
A hard problem is one where the lower bound is defined by $\Omega$(c$^n$) (exponential)
Non-determinism
- a Non-deterministic computer is one that can...
- 'guess' the correct solutions to a problem from among all possible solutions
- OR widely parallel computer that can simultaneously test all possible solutions (must be in polynomial time)
- A Non-deterministic algorithm is one that runs on a non-deterministic machine in polynomial time
- a problem in NP: Non-deterministic Polynomial
- Solving?
- Two step process
-
- non deterministic process that generates a possible solution
-
- look at outcome of the first step and decide if it's a real solution to the problem
-
- Two step process
Sorting
- Sorting is NP
- $P\subseteq NP$
- Problems in P like sorting have a deterministic algorithm
NP Complete problems
- If we can find a deterministic polynomial solution to one NP complete problem, we have a solution to all of the NP complete problems
- Used to describe the hardest problems of NP
- Problem reduction: reduce a known problem to an unknown problem.
- Proof of NP completeness:
- show problem is in NP
- show it is NP complete by reducing a known NP Complete problem to your problem
Data Structures
Lists
- Finite, ordered
- ADT Operations:
- clear
- insert
- append
- remove
- movetostart
- movetoend
- previous
- next
- length
- currpos
- move
- getvalue
List implementation comparison
- n number of elements
- P size of pointer
- E size of an element in storage unit
- D max number of elements that can be store
- Array Based: DE
- LL : n(P+E)
- when n > DE/(P+E) array becomes more efficient
- if P = E break even point is D/2
Generalizations:
if we don't know the size of the data, LL is probably the way to go
arraybased lists are more space efficient if the size is known and stable
array provides faster access by position
singly linked lists provide sequential access (O(n))
insertion/deletion in LL is O(1), but getting to the node is most likely O(n)
insertion/deletion in Array is O(n), but getting to the node is O(1)
- Double linked lists
Lists, Stacks, Queues
- List: most generalized
- adding and removing can happen anywhere
- Stack:
- adding and removing can happen in one place, the top of the list
- Push, Pop, Top
- Queue:
- adding happens at tail, removing happens in front (FIFO)
Search & Sort are only supported by list
Array based stack
- push/pop O(n)
- if n+1, O(1)
Linked Stack
- top pointer, just needs to be updated
For both, operations are O(1)
Deque
- queue where you can add and remove from the front and back
Array Based Queue
-
circular is cool
-
array vs linked queue
- both provide constant time for enqueue and dequeue
- space are the same as for linked list
- avoid arrays
Dictionaries
- not a data structure, but an ADT
- interface to a collection to store and retrieve data efficiently
- key/value
- implementations:
- array based list unsorted
- unsorted, inserting is constant
- finding and removing is $\Theta$(n) (sequential search)
- Linked
- same cost as array
- Sorted Array
- finding is $\Theta$(logn)
- insert & remove now is $\Theta$(n)
- array based list unsorted
Cutoff for midterm
Search
- Exact match: for a specific pair
- Range query: for all records in range
Three general approaches:
- Sequential
- Hashing
- Tree Indexing
- (sort first & binary sort)